Validation of a genome-wide polygenic score in improving fracture risk assessment beyond the FRAX tool in the Women’s Health Initiative study

Background Previous study has established two polygenic scores (PGSs) related to femoral neck bone mineral density (BMD) (PGS_FNBMDldpred) and total body BMD (PGS_TBBMDldpred) that are associated with fracture risk. However, these findings have not yet been externally validated in an independent cohort. Objectives This study aimed to validate the predictive performance of the two established PGSs and to investigate whether adding PGSs to the Fracture Risk Assessment Tool (FRAX) improves the predictive ability of FRAX in identifying women at high risk of major osteoporotic fracture (MOF) and hip fractures (HF). Methods The study used the Women’s Health Initiative (WHI) cohort of 9,000 postmenopausal women of European ancestry. Cox Proportional Hazard Models were used to assess the association between each PGS and MOF/HF risk. Four models were formulated to investigate the effect of adding PGSs to the FRAX risk factors: (1) Base model: FRAX risk factors; (2) Base model + PGS_FNBMDldpred; (3) Base model + PGS_TBBMDldpred; (4) Base model + metaPGS. The reclassification ability of models with PGS was further assessed using the Net Reclassification Improvement (NRI) and the Integrated discrimination improvement (IDI). Results The study found that the PGSs were not significantly associated with MOF or HF after adjusting for FRAX risk factors. The FRAX base model showed moderate discrimination of MOF and HF, with a C-index of 0.623 (95% CI, 0.609 to 0.641) and 0.702 (95% CI, 0.609 to 0.718), respectively. Adding PGSs to the base FRAX model did not improve the ability to discriminate MOF or HF. Reclassification analysis showed that compared to the model without PGS, the model with PGS_TBBMDldpred (1.2%, p = 0.04) and metaPGS (1.7%, p = 0.05) improve the reclassification of HF, but not MOF. Conclusions The findings suggested that incorporating genetic information into the FRAX tool has minimal improvement in predicting HF risk for elderly Caucasian women. These results highlight the need for further research to identify other factors that may contribute to fracture risk in elderly Caucasian women.


Introduction
Osteoporosis is a common bone disease that increases predisposition to fractures [1]. Worldwide, osteoporotic fractures have become a critical public health issue with a high post-fracture disability and mortality rate, resulting in social and economic burdens [2]. Early detection of the high-risk population could lead to efficacious preventive and therapeutic interventions.
Prior studies have demonstrated the polygenic nature of fractures [3][4][5][6]. The predisposition to osteoporotic fracture is attributable to a complex interaction between genetic and nongenetic factors [7]. Many assessment tools have been developed to identify susceptible individuals with a higher propensity to get clinically relevant fractures that merit an intervention [8][9][10]. However, the personalized genetic predisposition of experiencing a fracture was not incorporated into any of those tools. The Fracture Risk Assessment Tool (FRAX) is the most widely used risk stratification tool in North America. FRAX was used to assess the 10-year probability of major osteoporotic fracture (MOF) and hip fracture (HF) on an individual level using 12 clinical risk factors [10]. Nonetheless, the performance of FRAX in terms of fracture discrimination is unsatisfactory [11][12][13].
In the past decade, advanced genome-wide association studies (GWAS) have identified millions of common genetic variants associated with either fracture or fracture-related traits [14][15][16]. As a highly heritable (50-85%) skeletal measure [3] that predicts fracture risk, bone mineral density (BMD) has been comprehensively investigated in several GWASs, with many common genetic variants been discovered [14][15][16]. Moreover, extensive cohort resources have enabled the genetic prediction of such heritable clinical risk factors from genotypes through polygenic scores (PGS), which capture information from many single nucleotide polymorphisms (SNPs) assayed from genome-wide genotyping. We previously developed and validated two genome-wide PGSs related to the femoral neck (PGS_FNBMD ldpred ) and total body BMD (PGS_TBBMD ldpred ) in the UK Biobank (UKB) cohort [17]. Both genome-wide PGS was proven to be significantly associated with incident fracture risk, even after accounting for FRAX clinical risk factors [17] However, the UKB participants were generally younger and healthier than the general population. Our prior findings thus lack generalizability outside of the UKB cohort. Also, since not all 12 clinical risk factors in FRAX were available in the UKB, a comprehensive evaluation of PGS with complete adjustment was not conducted.
The objective of this study was twofold: firstly, to conduct a comprehensive validation of the predictive power of two previously established genome-wide PGSs in an external cohort and, secondly, to develop a PGS for BMD by combining the information of both PGS_FNBMD ldpred and PGS_TBBMD ldpred using a meta-analytic strategy. The study aimed to assess the PGSs' ability to stratify fracture risk and to determine if combining PGSs with FRAX would enhance the accuracy of identifying individuals at high risk of osteoporotic fracture.

Study cohort
The Women's Health Initiative (WHI) study is a nationwide health study aimed at preventing heart disease, breast cancer, and osteoporotic fractures in postmenopausal women [18]. In this study, we used genotype and phenotype data of four WHI sub-studies (the WHI Genomics and Randomized Trials Network (GARNET), the National Heart Lung and Blood Institute (NHLBI), the Population Architecture using Genomics and Epidemiology (PAGE), and the Women's Health Initiative Memory Study (WHIMS)) acquired through the Database of Genotype and Phenotype (dbGap). As the genetic models were primarily developed and trained using samples of European ancestry, we limited this study to only include Caucasian women to ensure a relatively homogeneous ancestry. Participants who were taking medication that affected BMD (n = 0) and participants who did not have complete information regarding FRAX risk factors were excluded from this study (n = 797). Overall, the data analysis included 9,203 eligible women.

Ethics approval
This study was approved by the Database of Genotype and Phenotype (dbGap) (http://www. ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?studyid=phs000200.v12.p3) and the institutional review board at the University of Nevada, Las Vegas. The data was fully anonymized before we accessed them, and UNLV IRB waived the informed consent.

Fracture and outcome ascertainment
The primary outcome of this study was MOF, which were defined as a composite of hip, humerus, forearm, and clinical vertebral fractures in accordance with the FRAX definition. The follow-up time for WHI participants was calculated from the date of their baseline visit until the occurrence of the first fracture or until the subject's death. Self-reported fractures were identified by questionnaires. People who did not experience a fracture or death were followed up until the end of the WHI main study, was 12 years after enrollment.

Clinical risk factors ascertainment
Information regarding age, race, exercise, smoking status, alcohol intake, previous fragility fractures, familial history of fracture, frequency of falls, medication use were collected at baseline using a standard medical questionnaire. Heavy drinking was defined as consume more than three alcoholic drinks per day. Smoking status was categorized following the American Heart Association (AHA) guidelines as "never", "past", and "current". Height was measured in centimeters in standing position, and weight was measured in kilograms using a balance beam.

Genotype imputation
The genotype data used in this study were obtained from blood samples and acquired through dbGap. Genotyping was performed using either the Illumina (Illumina, San Diego, California) or Affymetrix 6.0 Array Set Platform (Affymetrix, Santa Clara, California). Genotype imputation was carried out using the Sanger Imputation Server, employing the Haplotype Reference Consortium (HRC) reference panel and the Positional Burrows-Wheeler Transform (PBWT) imputation algorithm.

Polygenic scores
PGS_FNBMD ldpred and PGS_TBBMD ldpred were quantified using LDpred2 with optimal hyperparameters determined previously [17]. For femoral neck BMD and total body BMD, ρ thresholds of 0.03 and 0.13, respectively, were used to derive the genome-wide PGSs for each individual in the WHI cohort. Since the PGSs were BMD-related, greater PGS is projected to be associated with higher BMD and lower fracture risk.
To construct the metaPGS for BMD, we used the existing PGS_FNBMD ldpred and PGS_TBBMD ldpred scores. The metaPGS was calculated as follows: ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi where Z i1 ,Z i2 , are the standardized PGS_FNBMD ldpred and PGS_TBBMD ldpred for the ith individual, respectively. β 1 and β 2 are the univariate log odds ratios for each score (estimated using logistic regressions), and ρ 1,2 is the Pearson correlation between the ith and jth scores.

Statistical analysis
The PGS_FNBMD ldpred and PGS_TBBMD ldpred were standardized with a mean of zero and a standard deviation (SD) of one to enable a standardized comparison of effects. The study categorized participants into three groups based on the percentile distribution (�10%, 10-90%, and >90%) of each PGS to show the cumulative fracture incidence in individuals with distinct genetic profiles. The observed 10-year incidence of MOF by PGS groups was calculated using the cumulative incidence function (CIF), accounting for the competing mortality risk.
To investigate whether adding PGS would improve the predictive ability of FRAX, we formulated four models: (1) Base model: FRAX risk factors; (2) Base model + PGS_FNBMD ldpred ; (3) Base model + PGS_TBBMD ldpred ; (4) Base model + metaPGS. The magnitude of the association between each PGS and MOF/HF risk was assessed using hazard ratios (HRs) and corresponding 95% confidence intervals estimated from the Cox Proportional Hazard Models. Model comparison was performed using the likelihood ratio test. The Cox proportional hazard model's proportionality assumption was visually inspected and examined using the Schoenfeld residual test [19] The linearity assumption was checked using the martingale residual test [20]. All three PGSs (PGS_FNBMD ldpred , PGS_TBBMD ldpred , and metaPGS) were evaluated both as continuous and categorical variables in the survival analyses.
The performance of four models in identifying individuals at risk of sustaining a fracture was evaluated using the Area Under the Curve (AUC) and tested for statistical significance using the DeLong test [21]. The Net Reclassification Improvement (NRI) was used to assess the reclassification ability of each model by estimating the predicted risk of fracture for each individual and categorizing them into three risk groups. High risk was defined as a predicted MOF risk � 20% (HF risk � 3%), while low risk was defined as a predicted MOF risk < 20% (HF risk < 3%), based on the National Osteoporosis Foundation's recommended intervention cutoff [22]. The NRI was used to determine how well the models with PGSs reclassified subjects compared to the base FRAX model. The Integrated discrimination improvement (IDI) was used to measure the direction and extent of the change in the predicted risk. Statistical analyses were conducted in SAS 9.4 (SAS Institute, Inc., Cary, NC, USA).

Sample characteristics
The study analyzed a total of 9,203 women who were followed for an average of 12 years. Out of these, 1,255 (13.6%) women sustained at least one MOF during the follow-up period, with 600 (6.5%) experiencing HF. The average duration of follow-up for women who experienced at least one MOF was 6.7 years. The baseline characteristics of the participants were compared between fracture status groups and presented in Table 1. Older age, lower body mass index and had higher alcohol consumption were associated with a higher likelihood of fracture. Participants who experienced fractures also had a higher prevalence of prior fractures, family history of HFs, and falls in the past 12 months. The distribution of PGS_FNBMD ldpred , PGS_TBBMD ldpred , and metaPGS in the WHI cohort was generally normal.

Association between PGSs and fracture
The Cox Proportional Hazard model showed that when treating PGSs as continuous variables, all PGS_FNBMD ldpred , PGS_TBBMD ldpred , and metaPGS were not significantly associated with MOF or HF after adjusting for FRAX risk factors (Tables 2 and 3). We next used PGS groups   (Table 4). Similar findings with HF outcomes were also observed ( Table 5).

Model evaluation
To evaluate the ability of PGS_FNBMD ldpred , PGS_TBBMD ldpred , and metaPGS to discriminate fractures over FRAX, the study used the concordance index (C-index), as shown in Table 6. The FRAX base model showed moderate discrimination of MOF and HF, with C-index values of 0.623 (95% CI, 0.609 to 0.641) and 0.702 (95% CI, 0.609 to 0.718), respectively. However, no improvement was observed in discriminating MOF and HF when PGSs were added to the base FRAX model. In the reclassification analysis, compared to the FRAX base model, models with PGS_FNBMD ldpred (0.37%, p = 0.33), PGS_TBBMD ldpred (0.5%, p = 0.14), or metaPGS (0.05%, p = 0.99) did not improve the reclassification of MOF ( Table 7). The model incorporated PGS_FNBMD ldpred correctly reclassified five individuals (0.45%) to the high-risk group, and eighteen (0.27%) individuals who did not experience a MOF were correctly reclassified to the low-risk group. For the model including metaPGS, two (0.18%) individuals were correctly reclassified to the high-risk group, and seven (0.11%) women who did not experience a MOF were correctly reclassified to the low-risk group. The continuous NRI revealed that the improvement in MOF reclassification contributed by PGS_FNBMD ldpred , PGS_TBBMD ldpred , and metaPGS overall were 0.63% (p = 0.87), 0.88% (p = 0.81), and 1.54% (p = 0.68), respectively. Better reclassification provided by PGSs was observed in HF prediction. PGS_FNBMD ldpred and metaPGS improved the reclassification of HF significantly by 1.2% (p = 0.04) and 1.7% (p = 0.05). In the context of HF prediction, the FRAX+metaPGS model correctly reclassified 6 out of 600 (1.1%) participants with HF upward to the high-risk group, and 58 out of 8,603 (0.8%) women who did not experience HF were correctly reclassified downward to the low-risk group. Additionally, the inclusion of PGS_TBBMD ldpred to the base FRAX model let to a significant increase in IDI of 1.93% (p = 0.01) for predicting HF. However, the overall improvement in fracture event reclassification provided by the PGSs models was minimal.

Discussion
Implementing individual-level genome-wide PGSs summarizing the underlying genetic predisposition of certain diseases in the clinical setting holds excellent promise. Previously, we developed and internally validated two genome-wide BMD-related PGSs using data from the UKB cohort [17]. Our findings indicated that both PGSs were significantly associated with incident fractures, even after being adjusted for FRAX clinical risk factors [17]. In the current study, we replicated the two BMD-associated PGSs from previous work [17] and additionally derived a third PGS-metaPGS combining the effect of the two established PGSs and evaluated their predictive effect in an independent WHI postmenopausal women sample. We examined whether the PGSs could stratify individuals into different risk strata within this external validation cohort and the predictive ability of each PGS beyond the FRAX tool. Our findings showed that both the BMD-related genome-wide PGSs and the metaPGS did not perform as well and were not significantly associated with fractures in the WHI cohort. Moreover, adding genetic information to the FRAX tool was associated with minimal improvements in predicted probabilities of HF among elderly Caucasian women. These findings were in discordance with our previous research findings [17], of which PGSs calculated based on genome-wide significant loci showed significant association with fractures and provided minor improvement of fracture prediction beyond the base model consisting of convention clinical risk factors. Previous studies have produced mixed results regarding the effectiveness of polygenic scores in improving fracture prediction accuracy. For example, Thao et al. reported that genetic profiling of 63 BMD-related genetic variants could enhance fracture prediction performance when compared to the Garvan fracture risk calculator [23]. Our prior work demonstrated that incorporating genetic information from 81 BMD-related genetic variants could improve fracture prediction performance beyond FRAX [24]. A more recent study generated and validated a genome-wide PGS for speed of sound (SOS) also reported a consistent association with fracture risk [25]. However, Eriksson and colleagues reported only minor improvements in fracture prediction when adding a PGS to a base model consisting of age, height, and weight [26]. The performance of PGS in different cohorts is affected by many factors. The PGSs may have been overfitted to the training sample, meaning that it is too closely tied to the specific individuals and variants used in the training sample. This inconsistency can result in poor performance when applied to a validation sample. Compared to the UK Biobank genotype data, WHI genotyping data has fewer SNPs and less genome coverage. The hyper-parameters tuned in UKB might not be as optimal in WHI due to heterogeneity between the training and testing cohorts. Also, the allele frequencies of SNPs will vary by population, together with the causal variants and their effect sizes [27,28]. Moreover, genotyping imputation is one way to introduce variability in calculated PGSs at the individual level without changing the underlying genetic model [29]. The UKB carried out imputation on the genotype data using SHAPEIT3 and IMPUTE4, whereas the WHI was imputed using Positional Burrows-Wheeler Transform (PBWT) imputation algorithm. Lastly, while genetics plays a role in determining traits and conditions, other factors such as the environment, lifestyle, and sociodemographic characteristics can also influence the expression of these traits. So, even individuals with similar ancestry may have different risks for certain conditions based on these other factors, affecting the accuracy of genetic risk predictions [30].
This study comprehensively validated the predictive power of two previously established genome-wide PGSs, as well as a metaPGS that combined information from these two PGSs. Additionally, we assessed the ability of PGSs to stratify fracture risk and to determine if combining PGSs with FRAX would enhance the accuracy of identifying individuals at high risk of osteoporotic fracture. It is essential to acknowledge the limitations of this study. First, the sample size of current study is relatively small to replicate findings discovered in a cohort of half million (UKB); Second, fracture risk of WHI may not be fully captured by the PGS calculated using the hyper-parameters derived in other cohorts. Finally, our study only included individuals of European ancestry, which may limit the generalizability of our findings.
Early detection of high-risk individuals could lead to efficacious preventive and therapeutic interventions. However, based on the hyper-parameters derived in the UKB, we could not replicate our prior findings in this external independent WHI cohort. The two BMD-related genome-wide PGSs and the metaPGS were not significantly associated with fractures in the WHI cohort. Adding genetic information to the FRAX tool was associated with minimal improvements in predicted HF probabilities among elderly Caucasian women.